Run Your First Pipeline

Get from zero to a complete dataset of 2,775 stocks with 86 fields each in under 10 minutes (the optional first-time OHLCV history download in Phase 2.5 adds about 30 minutes).
Step 1: Navigate to Pipeline Directory

cd "$HOME/workspace/source/DO NOT DELETE EDL PIPELINE"
The directory name contains spaces, so quote the path in shell commands. Use $HOME rather than ~ inside the quotes, because the shell does not expand a quoted tilde.
Step 2: Run the Master Pipeline

python3 run_full_pipeline.py
This single command orchestrates 18 scripts across 6 phases:
Phase 1: Core Data (Foundation)
  • Fetches 2,775 NSE stocks
  • Creates master ISIN map
  • Downloads fundamental data (35 MB)
Phase 2: Data Enrichment
  • Company filings (hybrid LODR + Legacy)
  • Live announcements
  • Advanced technical indicators
  • Market news (50 articles/stock)
  • Corporate actions
  • Surveillance lists (ASM/GSM)
  • Circuit stocks
  • Bulk/Block deals
  • Price band revisions
Phase 2.5: OHLCV History (Optional)
  • Smart incremental download
  • Lifetime daily candles
  • First run: ~30 min | Incremental: ~2-5 min
Phase 3: Base Analysis
  • Builds master JSON with 60+ base fields
Phase 4: Enrichment (Sequential)
  1. Advanced metrics (ADR, RVOL, ATH, Turnover)
  2. Earnings performance (post-results returns)
  3. F&O data (lot sizes, expiry dates)
  4. Market breadth & relative strength
  5. Corporate events + news feed (LAST)
Phase 5: Compression
  • GZIP level 9 compression
  • 30 MB → 2-4 MB (85-90% reduction)
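
The Phase 5 step can be illustrated with Python's standard gzip module. This is a minimal sketch, not the pipeline's own code: the compress_json helper and the synthetic DEMO records are assumptions made for the example, and the ratio you see depends on how repetitive the data is.

```python
import gzip
import json
import os
import tempfile

def compress_json(src_path, dst_path=None, level=9):
    """Gzip src_path at the given level; return (original_bytes, compressed_bytes)."""
    dst_path = dst_path or src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb", compresslevel=level) as dst:
        # Stream in 1 MiB chunks so a large file is never held in memory at once.
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)
    return os.path.getsize(src_path), os.path.getsize(dst_path)

# Demo on synthetic records (hypothetical data, not real pipeline output).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.json")
    records = [{"Symbol": f"DEMO{i}", "P/E": 20.0, "Sector": "Energy"} for i in range(2000)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    before, after = compress_json(path)
    reduction = 100 * (1 - after / before)
    # Round-trip check: the compressed file must decode to the same records.
    with gzip.open(path + ".gz", "rt", encoding="utf-8") as f:
        restored = json.load(f)

print(f"{before} bytes -> {after} bytes ({reduction:.0f}% reduction)")
```

JSON with many repeated keys compresses well, which is consistent with the large reduction quoted above.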
Step 3: Monitor Pipeline Progress

Watch real-time progress with phase labels and timing:
═══════════════════════════════════════════════════════════════
  EDL PIPELINE - FULL DATA REFRESH
═══════════════════════════════════════════════════════════════

📦 PHASE 1: Core Data (Foundation)
────────────────────────────────────────
 Running fetch_dhan_data.py...
 fetch_dhan_data.py (12.3s)
 Running fetch_fundamental_data.py...
 fetch_fundamental_data.py (45.2s)
 Downloading NSE Listing Dates...
 NSE Listing Dates downloaded.

📡 PHASE 2: Data Enrichment (Fetching)
────────────────────────────────────────
 Running fetch_company_filings.py...
 fetch_company_filings.py (89.7s)
  ...

 PHASE 4: Enrichment (Injecting into Master JSON)
────────────────────────────────────────
 Running advanced_metrics_processor.py...
 advanced_metrics_processor.py (15.4s)
  ...

📦 PHASE 5: Compression (.json → .json.gz)
────────────────────────────────────────
  📦 Compressed: 32.4 MB → 3.2 MB (90% reduction)

═══════════════════════════════════════════════════════════════
  PIPELINE COMPLETE
═══════════════════════════════════════════════════════════════
  Total Time:  285.7s (4.8 min)
  Successful:  18/18
  Failed:      0/18

  📄 Output: all_stocks_fundamental_analysis.json.gz (3.2 MB)
  🧹 Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
═══════════════════════════════════════════════════════════════
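
The log above reports a duration per script and a success/failure tally. The sketch below shows one way such bookkeeping could be done with subprocess. It is an assumption-laden illustration, not the contents of run_full_pipeline.py: run_step, run_phases, and the placeholder steps are hypothetical, and this version keeps going after a failure, which may or may not match the real script.

```python
import subprocess
import sys
import time

def run_step(label, argv):
    """Run one command, print its duration, and return True if it exited with code 0."""
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    ok = result.returncode == 0
    print(f"  {'OK  ' if ok else 'FAIL'} {label} ({elapsed:.1f}s)")
    return ok

def run_phases(phases):
    """Run every step of every phase in order; return (succeeded, failed) counts."""
    succeeded = failed = 0
    for phase_name, steps in phases:
        print(phase_name)
        for label, argv in steps:
            if run_step(label, argv):
                succeeded += 1
            else:
                failed += 1
    return succeeded, failed

# Placeholder steps: each runs a trivial inline Python command, not a real pipeline script.
demo_phases = [
    ("PHASE 1: Core Data", [("step_a", [sys.executable, "-c", "print('a')"])]),
    ("PHASE 2: Data Enrichment", [
        ("step_b", [sys.executable, "-c", "print('b')"]),
        ("step_c", [sys.executable, "-c", "import sys; sys.exit(1)"]),  # simulated failure
    ]),
]

succeeded, failed = run_phases(demo_phases)
print(f"Successful: {succeeded}/{succeeded + failed}")
print(f"Failed:     {failed}/{succeeded + failed}")
```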
Step 4: Verify Output

Check that the compressed output file was created:
ls -lh all_stocks_fundamental_analysis.json.gz
Expected output:
-rw-r--r-- 1 user user 3.2M Mar 03 14:35 all_stocks_fundamental_analysis.json.gz
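
A file listing only confirms that the file exists. To also confirm that it decodes to a JSON array of stock objects, a check along these lines may help. This is a sketch: summarize_output is a hypothetical helper, and the demo runs on a tiny synthetic file rather than real output.

```python
import gzip
import json
import os
import tempfile

def summarize_output(path):
    """Load a gzipped JSON array and return (record_count, min_fields, max_fields)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        stocks = json.load(f)
    if not isinstance(stocks, list) or not stocks:
        raise ValueError("expected a non-empty JSON array of stock objects")
    field_counts = [len(s) for s in stocks]
    return len(stocks), min(field_counts), max(field_counts)

# Demo with a tiny synthetic file (hypothetical records).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as f:
        json.dump([{"Symbol": "AAA", "P/E": 20.0}, {"Symbol": "BBB"}], f)
    count, fewest, most = summarize_output(demo_path)

print(f"{count} records, {fewest}-{most} fields per record")
```

Against the real output you would call summarize_output("all_stocks_fundamental_analysis.json.gz") and compare the record count with the 2,775 stocks mentioned above. Whether every record carries the same number of fields is not stated here, so treat the field range as informational.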
Step 5: Extract and Inspect

Decompress and view a sample record:
gunzip -c all_stocks_fundamental_analysis.json.gz | python3 -m json.tool | head -n 100
Or use the built-in single stock analyzer:
python3 single_stock_analyzer.py
# Enter symbol when prompted: RELIANCE
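
If you prefer to look up a record from your own script, a lookup over the loaded array could look like the sketch below. It is independent of single_stock_analyzer.py, whose internals are not shown here. find_stock and brief are hypothetical helpers, and the sample list stands in for the array you would load with gzip.open and json.load.

```python
def find_stock(stocks, symbol):
    """Return the first record whose Symbol matches (case-insensitive), else None."""
    wanted = symbol.strip().upper()
    return next((s for s in stocks if str(s.get("Symbol", "")).upper() == wanted), None)

def brief(stock, fields=("Symbol", "Name", "Sector", "P/E")):
    """Return only the requested fields that are present in the record."""
    return {k: stock[k] for k in fields if k in stock}

# Stand-in data; the second record is hypothetical and deliberately lacks "P/E".
sample = [
    {"Symbol": "RELIANCE", "Name": "Reliance Industries Ltd.", "Sector": "Energy", "P/E": 28.45},
    {"Symbol": "DEMO", "Name": "Demo Ltd.", "Sector": "Example"},
]

record = find_stock(sample, " reliance ")
missing = find_stock(sample, "UNKNOWN")
print(brief(record))
print(missing)
```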

Configuration Options

Customize pipeline behavior by editing run_full_pipeline.py (lines 60-71):
# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False

# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True
OHLCV Impact: Setting FETCH_OHLCV = False will result in zero values for these fields:
  • ADR (Average Daily Range)
  • RVOL (Relative Volume)
  • ATH (All-Time High)
  • % from ATH
  • 200 Days EMA Volume
  • Returns since Earnings
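
To see whether a given output file was produced without OHLCV data, you could count how many records have empty or zero values in the affected fields. The sketch below uses only field names that appear in the sample output on this page. zero_field_counts is a hypothetical helper, and a genuine value of exactly 0 cannot be told apart from a skipped calculation.

```python
# Field names taken from the sample output; the full list of affected keys may be longer.
OHLCV_FIELDS = ["RVOL", "% from ATH", "5 Days MA ADR(%)", "Returns since Earnings(%)"]

def zero_field_counts(stocks, fields=OHLCV_FIELDS):
    """For each field, count records where it is missing, None, or equal to 0."""
    return {field: sum(1 for s in stocks if not s.get(field)) for field in fields}

# Stand-in data: one populated record and one with all OHLCV-derived fields at 0.
sample = [
    {"Symbol": "A", "RVOL": 1.45, "% from ATH": -12.3,
     "5 Days MA ADR(%)": 2.3, "Returns since Earnings(%)": 8.5},
    {"Symbol": "B", "RVOL": 0, "% from ATH": 0,
     "5 Days MA ADR(%)": 0, "Returns since Earnings(%)": 0},
]

counts = zero_field_counts(sample)
for field, n in counts.items():
    print(f"{field:28} zero or missing in {n}/{len(sample)} records")
```

If nearly every record reports zero for all of these fields, the file was most likely generated with FETCH_OHLCV = False.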

Understanding the Output

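The pipeline produces a JSON array with 2,775 stock objects. The sample below is abridged and annotated: the // comments and elisions are for illustration and do not appear in the actual file.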
[
  {
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Ltd.",
    "Listing Date": "29-Nov-1977",
    "Basic Industry": "Refineries",
    "Sector": "Energy",
    "Index": "NIFTY 50, NIFTY 500",
    
    // Fundamentals (35 fields)
    "Market Cap(Cr.)": 1725430.5,
    "Stock Price(₹)": 2567.85,
    "Latest Quarter": "Dec 2025",
    "Net Profit Latest Quarter(Cr.)": 18670.0,
    "EPS Latest Quarter": 27.65,
    "Sales Latest Quarter(Cr.)": 254890.0,
    "OPM Latest Quarter(%)": 12.4,
    // ... 25+ more fundamental fields
    
    // Valuation Ratios (10 fields)
    "P/E": 28.45,
    "Forward P/E": 24.12,
    "ROE(%)": 12.34,
    "ROCE(%)": 15.67,
    "D/E": 0.45,
    // ... 5+ more ratio fields
    
    // Technical Indicators (7 fields)
    "RSI (14)": 62.5,
    "SMA Status": "SMA 20: Above (4.9%) | SMA 50: Above (24.1%)",
    "EMA Status": "EMA 20: Above (6.3%) | EMA 200: Above (72.6%)",
    "Technical Sentiment": "RSI: Neutral | MACD: Bearish",
    "Pivot Point": 2545.50,
    
    // Price Performance (9 fields)
    "1 Day Returns(%)": 1.2,
    "1 Week Returns(%)": 3.4,
    "1 Month Returns(%)": 5.6,
    "1 Year Returns(%)": 24.5,
    "% from 52W High": -8.5,
    "% from ATH": -12.3,
    
    // Volume & Liquidity (6 fields)
    "RVOL": 1.45,
    "Daily Rupee Turnover 20(Cr.)": 850.5,
    "30 Days Average Rupee Volume(Cr.)": 890.2,
    
    // Volatility (4 fields)
    "5 Days MA ADR(%)": 2.3,
    "14 Days MA ADR(%)": 2.1,
    
    // Earnings Tracking (3 fields)
    "Quarterly Results Date": "14-Jan-2026",
    "Returns since Earnings(%)": 8.5,
    "Max Returns since Earnings(%)": 12.3,
    
    // Event Markers (multi-value string)
    "Event Markers": "📊: Results Recently Out | 💸: Dividend (15-Mar)",
    
    // Recent Announcements (array of objects)
    "Recent Announcements": [
      {
        "Date": "15-Jan-2026",
        "Headline": "Board Meeting - Consideration of Quarterly Results",
        "URL": "https://..."
      },
      // ... up to 5 items
    ],
    
    // News Feed (array of objects)
    "News Feed": [
      {
        "Title": "Reliance Industries Q3 profit beats estimates",
        "Sentiment": "positive",
        "Date": "15-Jan-2026 09:45"
      },
      // ... up to 5 items
    ]
  },
  // ... 2,774 more stocks
]
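
Several fields pack multiple values into one pipe-delimited string, for example "SMA Status" and "Event Markers". The sketch below splits them into structured values. It assumes every entry follows the exact shape shown in the sample ("Name: State (x%)"); parse_status and parse_markers are hypothetical helpers, and entries in any other shape are skipped.

```python
import re

# Matches one entry such as "SMA 20: Above (4.9%)".
STATUS_RE = re.compile(r"^(?P<name>.+?):\s*(?P<state>\w+)\s*\((?P<pct>-?\d+(?:\.\d+)?)%\)$")

def parse_status(text):
    """Turn 'SMA 20: Above (4.9%) | SMA 50: Above (24.1%)' into {name: (state, pct)}."""
    parsed = {}
    for part in text.split("|"):
        match = STATUS_RE.match(part.strip())
        if match:  # entries that do not fit the assumed shape are ignored
            parsed[match["name"]] = (match["state"], float(match["pct"]))
    return parsed

def parse_markers(text):
    """Split a multi-value 'Event Markers' string into a list of individual markers."""
    return [part.strip() for part in text.split("|") if part.strip()]

sma = parse_status("SMA 20: Above (4.9%) | SMA 50: Above (24.1%)")
markers = parse_markers("📊: Results Recently Out | 💸: Dividend (15-Mar)")
print(sma)
print(markers)
```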

Common Use Cases

import json
import gzip

with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rt') as f:
    stocks = json.load(f)

# Find stocks with:
# - ROE > 15%
# - P/E < 25
# - Market Cap > 1000 Cr
# - 1 Year Returns > 20%

screened = [
    s for s in stocks
    if s.get('ROE(%)', 0) > 15
    and s.get('P/E', 999) < 25
    and s.get('Market Cap(Cr.)', 0) > 1000
    and s.get('1 Year Returns(%)', -999) > 20
]

print(f"Found {len(screened)} stocks matching criteria:")
for stock in screened[:10]:
    print(f"{stock['Symbol']:12} | ROE: {stock['ROE(%)']:5.1f}% | P/E: {stock['P/E']:5.1f}")

# Find all stocks with upcoming dividends
dividend_stocks = [
    s for s in stocks
    if s.get('Event Markers') and 'Dividend' in s['Event Markers']
]

print(f"{len(dividend_stocks)} stocks with upcoming dividends:")
for stock in dividend_stocks:
    print(f"{stock['Symbol']:12} | {stock['Event Markers']}")

# Find stocks with strong post-earnings momentum
strong_earnings = [
    s for s in stocks
    if s.get('Returns since Earnings(%)', 0) > 10
    and s.get('Max Returns since Earnings(%)', 0) > 15
]

# Sort by returns since earnings
strong_earnings.sort(key=lambda x: x['Returns since Earnings(%)'], reverse=True)

print(f"{len(strong_earnings)} stocks with >10% returns since earnings:")
for stock in strong_earnings[:20]:
    print(f"{stock['Symbol']:12} | Since: {stock['Returns since Earnings(%)']:6.1f}% | Max: {stock['Max Returns since Earnings(%)']:6.1f}%")

# Find stocks with:
# - RSI between 50-70 (momentum but not overbought)
# - Above SMA 50
# - RVOL > 1.5 (high volume)

breakout_candidates = []
for s in stocks:
    rsi = s.get('RSI (14)', 0)
    sma_status = s.get('SMA Status', '')
    rvol = s.get('RVOL', 0)
    
    if (50 < rsi < 70 
        and 'SMA 50: Above' in sma_status 
        and rvol > 1.5):
        breakout_candidates.append(s)

print(f"{len(breakout_candidates)} breakout candidates:")
for stock in breakout_candidates:
    print(f"{stock['Symbol']:12} | RSI: {stock['RSI (14)']:5.1f} | RVOL: {stock['RVOL']:4.2f}")
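
The screens above compare values from s.get(...) directly, which raises a TypeError if a field is present but holds null or a string. Whether the real output ever contains such values is not stated here, so the following is a defensive sketch under that assumption. num is a hypothetical helper and the sample records are made up.

```python
def num(stock, field, default=0.0):
    """Return the field as a float, or default if it is missing, None, or non-numeric."""
    value = stock.get(field)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return default
    return float(value)

# Made-up records covering a clean value, a null, and a string placeholder.
sample = [
    {"Symbol": "AAA", "ROE(%)": 18.0, "P/E": 20.0},
    {"Symbol": "BBB", "ROE(%)": None, "P/E": 10.0},
    {"Symbol": "CCC", "ROE(%)": 30.0, "P/E": "N/A"},
]

screened_safe = [
    s for s in sample
    if num(s, "ROE(%)") > 15          # missing ROE counts as 0, so it fails the screen
    and num(s, "P/E", 999) < 25       # missing P/E counts as 999, so it fails the screen
]

for stock in screened_safe:
    print(f"{stock['Symbol']:12} | ROE: {num(stock, 'ROE(%)'):5.1f}% | P/E: {num(stock, 'P/E'):5.1f}")
```

Choosing a default that fails the filter keeps incomplete records out of the results instead of crashing the script.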

Next Steps

Pipeline Settings: Learn advanced configuration options
Field Reference: Explore all 86 output fields in detail
Pipeline Architecture: Understand the pipeline design
Data Fetching Scripts: See API reference for all scripts